Welcome to the ATFS (Alliance for Tropical Forest Science) data
harmonization app!
1 Intro
The app is a tool meant to be used by 2 or more networks that are
planning on combining their data for a common analysis.
1.1 Profiles
The app relies on “Profiles” that indicate how the data is stored in
the file(s) provided: names of columns storing the DBH, the census ID,
the tree tag, units of measurement etc…
A profile is a .rds file that is downloaded via the app once all the
information about the data has been provided in the Headers and Units tab of the app.
The same profile can be uploaded as “input profile” in the Headers and
Units tab, to speed up the process once your network’s data has been
profiled, and/or as “output profile” in the Output format tab, to
transform other networks’ data into that profile.
Some networks have their profile stored within the app.
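Because a profile is an ordinary .rds file, it can be saved and inspected in R outside the app. The sketch below only illustrates the file mechanics: it assumes, purely for illustration, that a profile is a named list linking standard fields to column names. The field names (DBH, Census, TreeTag, DBHUnit) and the file name are hypothetical, and the structure the app actually writes may differ.

```r
# Hypothetical profile: a named list (the real profile structure may differ)
profile <- list(
  DBH = "dbh_cm",        # column storing the DBH
  Census = "census_id",  # column storing the census ID
  TreeTag = "tag",       # column storing the tree tag
  DBHUnit = "cm"         # unit of the DBH measurements
)

# Save and reload it the way any .rds file is handled in R
profile_path <- file.path(tempdir(), "my_network_profile.rds")
saveRDS(profile, profile_path)
reloaded <- readRDS(profile_path)

# Look at what the profile records
str(reloaded)
```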
1.2 Getting your data ready
The app only accepts CSV files.
It performs best if all the information that you want to share is
collated into one analytical file, so we recommend that you append your
species and plot information to your measurement information beforehand,
and upload that one bigger file into the app.
That said, you can also use the app to do exactly that.
There is no limit to the number of files you can upload, but they all
need to connect to each other in one way or another, so that by
stacking and/or merging them, it is possible to collate them down to one
file. We will get to this in more detail in a moment.
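If you prefer to collate your files before uploading, the following is a minimal sketch of appending species and plot information to a measurement table in base R. All table names, column names and values are hypothetical, and it assumes your files share key columns (here sp_code and plot_id).

```r
# Hypothetical measurement table: one row per tree measurement
measurements <- data.frame(
  tag = c("T1", "T2", "T3"),
  plot_id = c("P1", "P1", "P2"),
  sp_code = c("CECOBT", "VIRSEB", "CECOBT"),
  dbh = c(12.3, 45.0, 8.7)
)

# Hypothetical species and plot tables
species <- data.frame(
  sp_code = c("CECOBT", "VIRSEB"),
  latin = c("Cecropia obtusifolia", "Virola sebifera")
)
plots <- data.frame(
  plot_id = c("P1", "P2"),
  plot_area_ha = c(1, 0.5)
)

# all.x = TRUE keeps every measurement row, even one without a match
analytical <- merge(measurements, species, by = "sp_code", all.x = TRUE)
analytical <- merge(analytical, plots, by = "plot_id", all.x = TRUE)

# Write the single analytical file as CSV, ready to upload
write.csv(analytical, file.path(tempdir(), "analytical.csv"), row.names = FALSE)
```

Using all.x = TRUE means a measurement with an unknown species code or plot ID is kept with missing values rather than silently dropped.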
The app also relies on tidy data, which means that every column is a
variable, every row is an observation and every cell is a single value.
For example, a data set with multiple columns for the DBH measurement
(e.g. DBH_2015, DBH_2020 etc…) is not a tidy data set. Instead, there
should be a column for the variable year (which, in our example, will
take a value of 2015 or 2020), and a column for DBH. If your data is not
in a tidy format, the Tidy table tab will help you reshape your data.
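To make the DBH_2015 / DBH_2020 example concrete, here is a minimal base R sketch of the same reshaping done by hand. The table and its values are made up, and this is not necessarily how the Tidy table tab works internally.

```r
# Hypothetical wide table: one DBH column per census year (not tidy)
wide <- data.frame(
  tag = c("T1", "T2"),
  DBH_2015 = c(10.1, 20.4),
  DBH_2020 = c(11.0, 22.3)
)

dbh_cols <- c("DBH_2015", "DBH_2020")

# Tidy version: one row per tree and year, with a single DBH column.
# unlist() concatenates the DBH columns one after the other, so the
# tags are repeated once per column and each year once per row.
long <- data.frame(
  tag = rep(wide$tag, times = length(dbh_cols)),
  year = rep(as.integer(sub("DBH_", "", dbh_cols)), each = nrow(wide)),
  DBH = unlist(wide[dbh_cols], use.names = FALSE)
)

long
```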
We recommend running the app on your local machine (via R and RStudio)
if one of the following cases applies to you:
- You have a poor internet connection
- You are working with large data files
- You are familiar with the development of Shiny apps and would like
  to troubleshoot any issues you may encounter yourself
To open the app in R, you will need to install the DataHarmonization
R package and launch Shiny with the following lines of code.
# install the R package
devtools::install_github("Alliance-for-Tropical-Forest-Science/DataHarmonization", build_vignettes = TRUE)
# run the app
shiny::runGitHub("Alliance-for-Tropical-Forest-Science/DataHarmonization", subdir = "inst/app")
Note that you may need to install the devtools package first
and that installing the DataHarmonization R package may ask you to
update a list of packages.
You’ll want to re-install the package every once in a while,
to get the latest version of the app.
1.4.2 Running the app online
If you don’t have R and RStudio and if your data is not too big, you
can choose to run the online version of the app by clicking on this link. Note
that the online version may be lagging behind the GitHub version.
2 Interacting with the app
Once the app is launched you can start interacting with it.
There are multiple tabs to go through. Some tabs will be skipped
automatically if they don’t apply to your situation and you may skip
others if you don’t need/want them.
When you land on a tab, always advance with an action button
(even if skipping) so your inputs are taken into account. You
may use the navigation panel to return to a previous tab but remember to
click on an action button to save your updated entries.
2.1 Upload your file(s)
This tab starts with information that we already covered in the intro.
The checklist is only a guideline to help you get ready, and you don’t
actually need to check the boxes to keep going.
The numbered tasks are the elements that you do need to complete to
be able to move forward.
1. Indicate how many tables you wish to upload.
2. Indicate the finest level of measurement in your data:
   - Plot: if your data only consists of plot-level measurements like
     species richness, total basal area, total number of stems etc…
   - Species: if your data consists of species-level measurements like
     abundance, basal area etc… This does not prevent you from uploading
     plot-level information if, e.g., the area of the plots in which you
     measured species-level abundance is stored in a separate file.
   - Tree: if your data consists of tree diameters, circumferences etc…
     and you are only measuring the main stem of each tree. This does not
     prevent you from uploading plot- and species-level information if,
     e.g., the area of the plots in which you measured your trees and the
     Latin names of the species they belong to are stored in separate
     files.
   - Stem: if your data consists of stem diameters, circumferences etc…
     and you may have multiple stems belonging to the same tree. This
     does not prevent you from uploading plot- and species-level
     information if, e.g., the area of the plots in which you measured
     your trees and the Latin names of the species they belong to are
     stored in separate files.
   Again, even if you are uploading plot-level information, if you have
   stem-level data, you should upload that file as well and indicate
   that your level of measurement is “Stem”.
3. Upload your tables. You’ll have as many upload boxes as you
   indicated needing in step 1. For each of them:
   - Click on Browse... and navigate to the CSV file you want to upload.
   - Type a more meaningful name to replace the generic “Table1”,
     “Table2” etc… This is particularly useful if you are uploading more
     than one file.
   - Check on the right-hand side that the columns and rows of your data
     are rendering properly.
   In the unlikely event that your tables are not rendering properly,
   adjust the parameters (separator and header) by clicking on the
   little gear icon.
4. Click on SUBMIT to proceed to the next step.
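The separator and header parameters mentioned above play the same role as the sep and header arguments of read.csv() in R. The sketch below uses a made-up semicolon-separated file to show why a table can render as a single column when the separator is wrong. It illustrates the general idea and is not the app's own import code.

```r
# Write a small semicolon-separated file (hypothetical content)
csv_path <- file.path(tempdir(), "semicolon_example.csv")
writeLines(c("tag;dbh", "T1;12.3", "T2;45.0"), csv_path)

# With the default comma separator, each line is read as a single value
wrong <- read.csv(csv_path)
ncol(wrong)

# With the correct separator, the two columns are recovered
right <- read.csv(csv_path, sep = ";", header = TRUE)
ncol(right)
str(right)
```

Checking a file this way before uploading can tell you which separator to choose in the app.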
2.2 Stack tables
If you uploaded more than one table, you will be taken to the
Stack tables tab. This tab is skipped if you only uploaded one table.
You will need to stack 2 or more tables if you are collecting the
same information in multiple files. This can be the case if, for
example, you keep your measurements from different plots in
different files, or you keep one file per census.
It is important that the files you are stacking have the same
set of columns.
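Stacking amounts to binding the rows of tables that share the same columns. A minimal base R sketch with made-up tables is shown below, including a check that the column sets match. It is an illustration only and does not reproduce the app's internal code.

```r
# Hypothetical per-plot tables with the same columns, in different orders
plot1 <- data.frame(tag = c("T1", "T2"), dbh = c(12.3, 45.0), plot_id = "P1")
plot2 <- data.frame(dbh = 8.7, tag = "T3", plot_id = "P2")
tables <- list(plot1, plot2)

# Check that every table has the same set of columns as the first one
same_columns <- all(vapply(
  tables,
  function(x) setequal(names(x), names(tables[[1]])),
  logical(1)
))
stopifnot(same_columns)

# rbind() on data frames matches columns by name, so order does not matter
stacked <- do.call(rbind, tables)
stacked
```

If the check fails, rename or add the missing columns in your files before stacking them.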